Background Von Willebrand disease (VWD) is underdiagnosed in symptomatic male and female patients, often delaying optimal treatment. This study addressed the high unmet need for timely VWD diagnosis using a machine learning (ML) approach and real-world claims data.

Objective To develop a predictive model to identify and characterize symptomatic, undiagnosed and potentially undertreated patients with VWD in the United States.

Methods Data from the Komodo longitudinal US claims databases (January 2015 through March 2020) were used to define 2 cohorts (diagnosed and undiagnosed) from closed claims based on diagnosis and procedure codes. The diagnosed cohort comprised patients with symptoms and a confirmed diagnosis of VWDᵃ, and the undiagnosed cohort included symptomatic undiagnosed persons suspected to have VWDᵇ. ML algorithms were built for male and female patients separately, because the timing and presenting bleeding symptoms differ between the sexes. For both cohorts, 80% of the data were used for model training and 20% for model testing. The first step was to identify patient characteristics that predicted VWD diagnosis in the diagnosed cohort (demographics; inpatient and outpatient encounters; claims for diagnosis, procedures, or treatment of bleeding; bleed types; comorbidities; and physician specialty). Four types of ML models were then trained as candidates for the final algorithms: random forest, neural network, conditional forest, and gradient boosting machine (GBM). The 2 models with the highest accuracy (1 for male and 1 for female patients) were selected to identify symptomatic patients likely to have VWD. An 80% cutoff was used to identify the final number of suspected patients with VWD in the undiagnosed cohort. Full patient profiles were constructed for both cohorts.
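The selection and cutoff steps described above can be sketched as follows. This is a minimal illustration only, not the study's implementation: model training is omitted, the two scoring functions are hypothetical stand-ins for fitted models, the toy feature `bleed_claims` is invented, and the 80% cutoff is assumed here to be a threshold on a predicted probability, which the abstract does not state.

```python
import random

def split_train_test(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into training and test portions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]

def accuracy(score_fn, rows, threshold=0.5):
    """Share of rows whose thresholded score agrees with the known label."""
    correct = sum(
        (score_fn(features) >= threshold) == bool(label)
        for features, label in rows
    )
    return correct / len(rows)

def select_best_model(candidates, test_rows):
    """Return the (name, score_fn) pair with the highest test accuracy."""
    best = max(candidates, key=lambda name: accuracy(candidates[name], test_rows))
    return best, candidates[best]

def flag_suspected(score_fn, undiagnosed, cutoff=0.80):
    """Return IDs of undiagnosed persons whose score meets the cutoff (assumed meaning)."""
    return [pid for pid, features in undiagnosed if score_fn(features) >= cutoff]

# Toy labeled data: (features, label). Entirely made up for illustration.
rows = [({"bleed_claims": n}, int(n >= 3)) for n in range(10)]
train_rows, test_rows = split_train_test(rows)  # 80% / 20% split

# Hypothetical stand-ins for fitted models; each maps features to a score in [0, 1].
candidates = {
    "scorer_a": lambda f: min(f["bleed_claims"] / 5, 1.0),
    "scorer_b": lambda f: 0.9,
}
best_name, best_fn = select_best_model(candidates, test_rows)

# Apply the selected scorer to a toy undiagnosed cohort.
undiagnosed = [
    ("p1", {"bleed_claims": 5}),
    ("p2", {"bleed_claims": 3}),
    ("p3", {"bleed_claims": 1}),
]
suspected = flag_suspected(best_fn, undiagnosed)
print(best_name, suspected)
```

In a sex-stratified analysis such as the one described, the same steps would be run once on the male data and once on the female data, so each sex could end up with a different selected model.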

Results The diagnosed cohort with pre-diagnosis symptoms included 5,981 patients, and the undiagnosed cohort included 4,869,518 persons. The most accurate ML models selected were the random forest model for female patients (overall accuracy, 84%; positive predictive value [PPV], 93%; sensitivity, 73%) and the GBM model for male patients (overall accuracy, 85%; PPV, 92%; sensitivity, 77%). The patient characteristics identified to inform the ML models were applied to the undiagnosed cohort, and 48,902 persons with suspected undiagnosed, symptomatic VWD (28,463 female and 20,439 male patients) were identified. Profiles were created for suspected patients with VWD based on the prevalence of bleed types, product use, diagnostic tests, bleeding procedures, and events (Figure 1).
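For reference, the three reported performance measures follow the standard definitions computed from a confusion matrix. The sketch below uses invented counts chosen only to show the arithmetic; they are not the study's data, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, PPV, and sensitivity from confusion-matrix counts.

    tp: true positives, fp: false positives,
    fn: false negatives, tn: true negatives.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "ppv": tp / (tp + fp),           # share of predicted positives that are true
        "sensitivity": tp / (tp + fn),   # share of true positives that are detected
    }

# Illustrative counts only (not taken from the study).
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

A high PPV combined with a lower sensitivity, the pattern reported for both models, means that most flagged patients are true cases while a share of true cases goes unflagged.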

Conclusions A high-accuracy ML algorithm can likely be used to identify patients with undiagnosed VWD based on the unique characteristics of patients with a confirmed VWD diagnosis. This is the first step in developing an ML algorithm that may support timely diagnosis and treatment of VWD, thereby improving patients' quality of life. Further external validation of the best models for patients with VWD is needed.

ᵃ Diagnosed cohort inclusion/exclusion criteria: Patients must have ≥2 confirmed VWD diagnoses, or ≥1 VWD diagnosis (Dx) and 1 VWF product prescription (Rx) claim, based on ICD-9 and ICD-10 codes; a minimum of 60 days from the first VWD Dx/VWF Rx to the last VWD Dx/VWF Rx; a first VWD diagnosis dated before the first VWD Rx; and at least 24 months of continuous enrollment before the date of initial diagnosis. Patients with ≥1 diagnosis of hemophilia A, aortic stenosis, extracorporeal membrane oxygenation (ECMO), or ventricular assist device, and patients without at least 1 bleeding code, were excluded.

ᵇ Undiagnosed cohort inclusion/exclusion criteria: Suspected patients must not have any VWD diagnosis/Rx, hemophilia A/acquired hemophilia, or ≥2 inclusion/exclusion criteria of diagnoses for a general bleeding disorder and must have a history of bleeding or treatment for bleeding within the 24-month timeframe of April 2018 through March 2020.

Sidonio, Jr.: Biomarin: Honoraria; UniQure: Honoraria; Novo Nordisk: Honoraria; Bayer: Honoraria; Guardian Therapeutics: Honoraria; Octapharma: Honoraria, Research Funding; Takeda: Honoraria, Research Funding; Genentech: Honoraria, Research Funding; Pfizer: Honoraria; Spark: Honoraria. Hale: Takeda: Current Employment, Current holder of stock options in a privately-held company. Caicedo: Takeda: Current Employment, Current holder of stock options in a privately-held company. Bullano: Takeda: Current Employment, Current holder of stock options in a privately-held company. Xing: Takeda: Current Employment, Current holder of stock options in a privately-held company.

Author notes

* Asterisk with author names denotes non-ASH members.
